Digital Preservation of Geospatial Data
Authors
Julie Sweetkind-Singer, Mary Lynette Larsgaard, and Tracey Erwin
LIBRARY TRENDS, Vol. 55, No. 2, Fall 2006 (“Geographic Information Systems and Libraries,” edited by Jaime Stoltenberg and Abraham Parrish), pp. 304–314. © 2006 The Board of Trustees, University of Illinois
Abstract
The selection, acquisition, and management of digital data are now part and parcel of the work librarians handle on a day-to-day basis. While much thought goes into this work, little consideration may be given to the long-term preservation of the collected data. Digital data cannot be retained for the future in the same way paper-based materials have traditionally been handled. Specific issues arise when archiving digital data and especially geospatial data. This article will discuss some of those issues, including data versioning, file size, proprietary data formats, copyright, and the complexity of file formats. Collection development topics, including what to collect and why, will also be explored. The work underlying this article is being done as part of an award from the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP).
Introduction
Digital geospatial data is now routinely found in libraries that carry cartographic data, geologic information, social science datasets, and other materials in support of disciplines using Geographic Information Systems (GIS) in their research and work. Over the course of years, the data have been received on floppy disks, CD-ROMs, DVDs, and hard drives or are available for free or for a fee over the Internet. In the paper world, ensuring longevity of items means creating ideal conditions in which to store collections. Materials will last longer if kept in a cool space with little light and the correct humidity and handled as seldom as possible. The same is not true for digital data.
As Clay Shirky (of New York University’s Interactive Telecommunications Program) pointed out in July 2005 at the biannual meeting of the National Digital Information Infrastructure and Preservation Program (NDIIPP), digital materials must be touched and manipulated on a regular basis if they are to survive. Leaving digital data alone will certainly cause it to be lost, and the time frame may be surprisingly short. Technology is changing at such a rapid pace that it can now be a challenge to find a machine that will read floppy disks, much less the obsolete program on which the data was supposed to run. Web sites can be and are removed at a moment’s notice. This is especially frustrating for the federal depository libraries that formerly received paper copies of government information now available only in digital formats. Clearly, librarians must begin thinking about long-term preservation of their digital collections, from what to collect to ensuring that it is preserved with the same thoughtfulness and care that is given to hardcopy materials.
The Library of Congress and the NDIIPP Awards
In December 2000 Congress appropriated nearly $100 million to underwrite the cost of studying the issues related to the long-term preservation of digital data. The program was to be administered by the Library of Congress and was named the National Digital Information Infrastructure and Preservation Program (Library of Congress, 2006a). Conference Report H. Rept. 106–1033 stated that
“The overall plan should set forth a strategy for the Library of Congress, in collaboration with other Federal and non-Federal entities, to identify a national network of libraries and other organizations with responsibilities for collecting digital materials that will provide access to and maintain those materials. . . . In addition to developing this strategy, the plan shall set forth, in concert with the Copyright Office, the policies, protocols, and strategies for the long-term preservation of such materials, including the technological infrastructure required at the Library of Congress.” (Library of Congress, 2006b)
The goal of the program was to create a network of committed partners willing to work on the policies, protocols, and architectures needed to build a series of archives to house digital materials. The first round of major funding was announced in September 2004, with eight projects receiving a total of $13.8 million over a three-year period. Two of these projects focused specifically on geospatial data. The North Carolina State University Libraries partnered with the North Carolina Center for Geographic Information and Analysis to create a model for archiving the local and state government output of digital geospatial resources, including digitized maps. The project is designed to serve as a demonstration for other states. The second contract was given jointly to the University of California at Santa Barbara (UCSB) and Stanford University to underwrite the creation of the National Geospatial Digital Archive (NGDA). The NGDA’s goal is to design repository infrastructures at each university and to collect materials across a broad spectrum of geographic formats. The team will work to expand the network of organizations committed to preserving geospatial content (Library of Congress, 2004).
The NGDA Project
The NGDA project has both research and development components. Research topics include considerations for long-term preservation; collection development, including prioritization and scope; architectural and economic models; rights issues; and best practices.
The two libraries are developing prototype archives for housing the data and jointly creating a geospatial format registry to describe the data being stored. During the second year of the grant the two archives will be federated using the Alexandria Digital Library (ADL) software interface (see Figure 1).
Technical Architectures
The two repositories are being built using similar technologies while at the same time meeting the specific needs of each institution. Both architectures contain standards-based interfaces, clearly defined metadata formats, an underlying format registry, a goal of end-to-end automation of the systems, and exploration into open source front ends. UCSB has developed a repository specifically to house geospatial information, with tools and templates designed around common data structures. Stanford is building a repository to hold all of its digital content no matter what its nature; the goal is to determine if a general digital repository can adequately handle the complexities of geospatial data formats using standard metadata and a content transfer manifest, which include provisions for geospatial information. As of the end of December 2005, both repositories were complete through their first stages and had ingested geospatial data.
Format Registries
Technically, geospatial data is more complex than standard digital formats. This must be accounted for when archiving the data. In order to preserve a data format, information about that format must be known. The archive has to have an automated way to understand the file it has received and to verify that it is what it purports to be. This format information is typically stored in a registry, which records detailed metadata about the types of files. For example, format information for a GeoTIFF would include specifications for the correct TIFF standard and explanations of any accompanying files, such as those containing projection information.
The format registry can be as complex as a custom-made database or as simple as a Web page or text document.
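As a rough illustration of the "simple" end of that range, the sketch below keeps a format registry as an in-memory Python dictionary and uses it to check whether a file's leading bytes match the format it claims to be. It is not the NGDA registry: the names FormatEntry, REGISTRY, identify, and verify are hypothetical, and the sidecar extensions listed (.tfw and .prj) are only common examples of accompanying files. The two TIFF byte-order signatures are the standard ones. Matching a signature confirms only that the file is a TIFF container; a real check would also need to inspect the georeferencing tags.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FormatEntry:
    """One registry record (hypothetical layout, for illustration only)."""
    name: str
    specification: str                   # which standard the file must follow
    signatures: Tuple[bytes, ...]        # leading bytes that identify the container
    sidecar_extensions: Tuple[str, ...]  # accompanying files, e.g. projection info


# A registry can be as simple as a dictionary keyed by a format identifier.
REGISTRY = {
    "geotiff": FormatEntry(
        name="GeoTIFF",
        specification="TIFF 6.0 container with GeoTIFF georeferencing tags",
        # TIFF files start with a byte-order mark followed by the number 42.
        signatures=(b"II*\x00", b"MM\x00*"),
        # Assumed examples of accompanying files: world file and projection file.
        sidecar_extensions=(".tfw", ".prj"),
    ),
}


def identify(header: bytes) -> Optional[str]:
    """Return the registry key whose signature matches the header, if any."""
    for key, entry in REGISTRY.items():
        if any(header.startswith(sig) for sig in entry.signatures):
            return key
    return None


def verify(claimed: str, header: bytes) -> bool:
    """Check that a file's leading bytes match the format it purports to be."""
    entry = REGISTRY.get(claimed)
    if entry is None:
        return False  # unknown formats cannot be verified
    return any(header.startswith(sig) for sig in entry.signatures)
```

In use, an ingest step would read the first few bytes of a submitted file and call verify with the format named in its metadata, rejecting or flagging the file when the two disagree.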
Similar resources
Curation and Preservation of Complex Data: The North Carolina Geospatial Data Archiving Project
The North Carolina Geospatial Data Archiving Project (NCGDAP) is a three-year joint effort of the North Carolina State University Libraries and the North Carolina Center for Geographic Information and Analysis focused on collection and preservation of digital geospatial data resources from state and local government agencies. NCGDAP is being undertaken in partnership with the Library of Congres...
Advancing Geospatial Data Curation*
Digital curation is a new term that encompasses ideas from established disciplines: it defines a set of activities to manage and improve the transfer of the increasing volume of data products from producers of digital scientific and academic data to consumers, both now and in the future. Research topics in this new area are in a formative stage, but a variety of work that can serve to advance t...
Preserving Geospatial Data: The National Geospatial Digital Archive’s Approach
The National Geospatial Digital Archive (NGDA) is one of eight initial projects funded by the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP). The project’s overarching goal is to answer the question: How can we preserve geospatial data on a national scale and make it available to future generations? This paper summarizes the project’s work in...
A Partnership Framework for Geospatial Data Preservation in North Carolina
The North Carolina Geospatial Data Archiving Project (NCGDAP) is a joint project of the NC State University Libraries and the NC Center for Geographic Information and Analysis focusing on collection and preservation of state and local agency digital geospatial data resources. The project is being carried out in collaboration with the Library of Congress under the National Digital Information In...
The National Geospatial Digital Archives - Collection Development: Lessons Learned
There are many similarities between building a geospatial digital archive and building a hard-copy map collection, and two major ones are the necessity to have a collection development policy and the amount of hard work required to seek out and acquire the resources. Two institutions, University of California at Santa Barbara and Stanford University, the initial partners in the National Geospat...
Journal title:
- Library Trends
Volume 55, Issue 2
Pages 304–314
Publication date 2006